16 research outputs found

    New Datasets, Models, and Optimization

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2021. 손현태.

    Obtaining a high-quality clean image is the ultimate goal of photography. In practice, everyday photographs are often taken in dynamic environments with moving objects and shaking cameras. The relative motion between the camera and the objects during the exposure causes motion blur in images and videos, degrading the visual quality. In dynamic environments, the strength of the blur and the shape of the motion trajectory vary from image to image and from pixel to pixel. This locally varying property makes the removal of motion blur in images and videos severely ill-posed. Rather than designing analytic solutions based on physical modeling, machine learning-based approaches can serve as a practical solution to such a highly ill-posed problem. In particular, deep learning has recently become the standard in the computer vision literature. This dissertation introduces deep learning-based solutions for image and video deblurring by tackling practical issues from several angles.

    First, a new way of constructing datasets for the dynamic scene deblurring task is proposed. It is nontrivial to simultaneously obtain a temporally aligned pair of blurry and sharp images. The lack of data hinders both the development of supervised learning techniques and the evaluation of deblurring algorithms. By mimicking the camera imaging pipeline with high-speed videos, realistic blurry images can be synthesized. In contrast to previous blur synthesis methods, the proposed approach can reflect the naturally complex local blur arising from multiple moving objects, varying depth, and occlusion at motion boundaries.

    Second, based on the proposed datasets, a novel neural network architecture for the single-image deblurring task is presented. Adopting the coarse-to-fine approach that is widely used in energy optimization-based methods for image deblurring, a multi-scale neural network architecture is derived. Compared with a single-scale model of similar complexity, the multi-scale model exhibits higher accuracy and faster speed.

    Third, a lightweight recurrent neural network architecture for video deblurring is proposed. To obtain a high-quality video from deblurring, it is important to exploit the intrinsic information in the target frame as well as the temporal relation between neighboring frames. Benefiting from both, the proposed intra-frame iterative scheme applied to the RNNs improves accuracy without increasing the number of model parameters.

    Lastly, a novel loss function is proposed to better optimize the deblurring models. Estimating a dynamic blur for a clean and sharp image without given motion information is another ill-posed problem. While the goal of deblurring is to remove motion blur completely, conventional loss functions fail to train neural networks to achieve this goal, leaving traces of blur in the deblurred images. The proposed reblurring loss functions are designed to better eliminate motion blur and to produce sharper images. Furthermore, the self-supervised learning process facilitates the adaptation of the deblurring model at test time.

    With the proposed datasets, model architectures, and loss functions, deep learning-based single-image and video deblurring methods are presented. Extensive experimental results demonstrate state-of-the-art performance both quantitatively and qualitatively.

    Table of contents:
    1 Introduction
    2 Generating Datasets for Dynamic Scene Deblurring
      2.1 Introduction
      2.2 GOPRO dataset
      2.3 REDS dataset
      2.4 Conclusion
    3 Deep Multi-Scale Convolutional Neural Networks for Single Image Deblurring
      3.1 Introduction
        3.1.1 Related Works
        3.1.2 Kernel-Free Learning for Dynamic Scene Deblurring
      3.2 Proposed Method
        3.2.1 Model Architecture
        3.2.2 Training
      3.3 Experiments
        3.3.1 Comparison on GOPRO Dataset
        3.3.2 Comparison on Kohler Dataset
        3.3.3 Comparison on Lai et al. [54] dataset
        3.3.4 Comparison on Real Dynamic Scenes
        3.3.5 Effect of Adversarial Loss
      3.4 Conclusion
    4 Intra-Frame Iterative RNNs for Video Deblurring
      4.1 Introduction
      4.2 Related Works
      4.3 Proposed Method
        4.3.1 Recurrent Video Deblurring Networks
        4.3.2 Intra-Frame Iteration Model
        4.3.3 Regularization by Stochastic Training
      4.4 Experiments
        4.4.1 Datasets
        4.4.2 Implementation details
        4.4.3 Comparisons on GOPRO [72] dataset
        4.4.4 Comparisons on [97] Dataset and Real Videos
      4.5 Conclusion
    5 Learning Loss Functions for Image Deblurring
      5.1 Introduction
      5.2 Related Works
      5.3 Proposed Method
        5.3.1 Clean Images are Hard to Reblur
        5.3.2 Supervision from Reblurring Loss
        5.3.3 Test-time Adaptation by Self-Supervision
      5.4 Experiments
        5.4.1 Effect of Reblurring Loss
        5.4.2 Effect of Sharpness Preservation Loss
        5.4.3 Comparison with Other Perceptual Losses
        5.4.4 Effect of Test-time Adaptation
        5.4.5 Comparison with State-of-The-Art Methods
        5.4.6 Real World Image Deblurring
        5.4.7 Combining Reblurring Loss with Other Perceptual Losses
        5.4.8 Perception vs. Distortion Trade-Off
        5.4.9 Visual Comparison of Loss Function
        5.4.10 Implementation Details
        5.4.11 Determining Reblurring Module Size
      5.5 Conclusion
    6 Conclusion
    Abstract in Korean
    Acknowledgments
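
    The dataset idea in the abstract above (synthesizing a blurry image by mimicking the camera pipeline with high-speed video) can be illustrated with a small sketch. This is not the dissertation's code: the function name `synthesize_blur`, the gamma value of 2.2 used as a stand-in for the camera response curve, and the choice of the middle frame as the sharp target are all assumptions made for illustration.

```python
def synthesize_blur(frames, gamma=2.2):
    """Average consecutive sharp frames to imitate one long exposure.

    frames: list of images, each a list of rows of floats in [0, 1].
    gamma:  assumed exponent of the camera response curve (illustrative).
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Undo the assumed gamma so the averaging happens on
            # approximately linear light, as a sensor would integrate it.
            linear_sum = sum(frame[y][x] ** gamma for frame in frames)
            mean_linear = linear_sum / len(frames)
            # Re-apply the gamma to return to display-referred values.
            row.append(mean_linear ** (1.0 / gamma))
        blurred.append(row)
    return blurred


# Toy high-speed clip: a bright dot moving one pixel per frame
# across a dark 1x4 image.
frames = [[[1.0 if x == t else 0.0 for x in range(4)]] for t in range(4)]
blurry = synthesize_blur(frames)       # the dot is smeared over all 4 pixels
sharp = frames[len(frames) // 2]       # assumed sharp target: the middle frame
```

    Because every pixel is averaged independently over real frames, moving objects, depth variation, and occlusion all leave their own local blur in the result, which is the property the abstract emphasizes.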

    Enhanced Deep Residual Networks for Single Image Super-Resolution

    Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. The significant performance improvement of our model comes from optimizing it by removing unnecessary modules from conventional residual networks. The performance is further improved by expanding the model size while stabilizing the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE2017 Super-Resolution Challenge.
    Comment: To appear in CVPR 2017 workshop. Best paper award of the NTIRE2017 workshop, and winner of the NTIRE2017 Challenge on Single Image Super-Resolution

    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

    Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability. A user can select a word in the input text and paint it on a canvas to control the output, which is very handy for crafting the image the user has in mind. The project page is available at https://deepimagination.cc/eDiff-I
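
    The claim that an ensemble of stage-specific denoisers keeps the inference cost unchanged follows from the fact that only one expert runs at each sampling step. The sketch below illustrates that routing idea only. It is not the eDiff-I implementation: the class `ExpertEnsemble`, the single handover boundary at noise level 0.5, and the toy expert functions are hypothetical.

```python
import bisect


class ExpertEnsemble:
    """Route each denoising step to the expert responsible for its noise level."""

    def __init__(self, boundaries, experts):
        # boundaries: ascending noise levels at which one expert hands
        # over to the next, so len(experts) must be len(boundaries) + 1.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted in ascending order")
        if len(experts) != len(boundaries) + 1:
            raise ValueError("need exactly one more expert than boundaries")
        self.boundaries = list(boundaries)
        self.experts = list(experts)

    def select(self, noise_level):
        # bisect_right finds which interval the noise level falls into.
        return self.experts[bisect.bisect_right(self.boundaries, noise_level)]

    def denoise(self, x, noise_level, prompt):
        # Exactly one expert is evaluated per step, so the per-step cost
        # equals that of a single model.
        return self.select(noise_level)(x, noise_level, prompt)


# Toy stand-ins for trained networks (hypothetical).
def high_noise_expert(x, noise_level, prompt):
    return ("high", x)


def low_noise_expert(x, noise_level, prompt):
    return ("low", x)


ensemble = ExpertEnsemble([0.5], [low_noise_expert, high_noise_expert])
schedule = [1.0, 0.75, 0.5, 0.25]  # sampling runs from high noise to low noise
used = [ensemble.select(level).__name__ for level in schedule]
```

    In this toy schedule the high-noise expert handles the early steps, where the abstract says the text prompt matters most, and the low-noise expert handles the final step.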

    eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference

    Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. Therefore, they have great potential to revolutionize the image pipelines on cameras and displays. However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. Therefore, finding a more memory- and computation-efficient microarchitecture is crucial to speed up this coming revolution. In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. We apply a block-based inference flow which can eliminate all the DRAM bandwidth for feature maps and accordingly propose a hardware-oriented network model, ERNet, to optimize image quality based on hardware constraints. Then we devise a coarse-grained instruction set architecture, FBISA, to support power-hungry convolution by massive parallelism. Finally, we implement an embedded processor, eCNN, which accommodates ERNet and FBISA with a flexible processing architecture. Layout results show that it can support high-quality ERNets for super-resolution and denoising at up to 4K Ultra-HD 30 fps while using only DDR-400 and consuming 6.94W on average. By comparison, the state-of-the-art Diffy uses dual-channel DDR3-2133 and consumes 54.3W to support lower-quality VDSR at Full HD 30 fps. Lastly, we will also present application examples of high-performance style transfer and object recognition to demonstrate the flexibility of eCNN.
    Comment: 14 pages; appearing in IEEE/ACM International Symposium on Microarchitecture (MICRO), 201

    Dynamic Video Deblurring Using a Locally Adaptive Blur Model

    No full text

    Attentive Fine-Grained Structured Sparsity for Image Restoration

    Image restoration tasks have witnessed great performance improvement in recent years through the development of large deep models. Despite the outstanding performance, the heavy computation demanded by the deep models has restricted the application of image restoration. To lift the restriction, the size of the networks must be reduced while maintaining accuracy. Recently, N:M structured pruning has emerged as one of the effective and practical pruning approaches for making a model efficient under an accuracy constraint. However, it fails to account for the different computational complexities and performance requirements of different layers of an image restoration network. To further optimize the trade-off between efficiency and restoration accuracy, we propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer. Extensive experimental results on super-resolution and deblurring tasks demonstrate the efficacy of our method, which outperforms previous pruning methods significantly. PyTorch implementation for the proposed methods will be publicly available at https://github.com/JungHunOh/SLS_CVPR2022.
    Comment: Accepted to CVPR 202
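
    For readers unfamiliar with N:M structured sparsity, the sketch below shows the basic masking step under the common convention that N of every M consecutive weights are kept nonzero. It is not the method of this paper: the function `prune_n_m`, the toy layer names, and the hand-picked per-layer values of N are hypothetical, whereas the paper proposes a way to determine the per-layer ratio.

```python
def prune_n_m(weights, n, m):
    """Keep the n largest-magnitude weights in each group of m; zero the rest."""
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest magnitudes inside this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Two toy layers with the same weights but different sparsity levels.
# The per-layer N values are picked by hand purely for illustration.
layer_weights = {
    "head": [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6],
    "body": [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6],
}
layer_n = {"head": 2, "body": 1}  # 2:4 for "head", 1:4 for "body"
pruned_layers = {
    name: prune_n_m(w, layer_n[name], 4) for name, w in layer_weights.items()
}
```

    Using a different N per layer is the degree of freedom the abstract refers to: layers that matter more for restoration quality can keep more weights per group than layers that mainly add computation.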